MINISTAT: A Statistical Package for the Commodore 64
Copyright 1989 by Jon Rich, Ph.D.

MINISTAT is a statistical package program which performs both univariate and bivariate inferential and descriptive statistics. A particularly useful feature of this package is that the data need to be entered only once. Once the data file has been set up, one may perform any of the included statistical tests on any of the variables.

A MINISTAT data file is a two-dimensional array, or table, of data. One dimension is the variables. These may be subject characteristics, such as sex or race, subject measurements, such as height, test scores, or running speed, or any other characteristic on which subjects vary. The other dimension is the cases, or subject number. Data for a typical MINISTAT file is shown below:

                        Variables
                SEX  RACE  HT.  WT1  WT2
            1    1    1    68   143  140
    Case    2    2    2    60   105  103
    No.     3    1    3    69   162  153
            4    1    2    70   168  160
            5    2    3    63   115  118
            6    2    1    65   123  125
            7    1    2    69   149  147
            8    2    2    67   145  140
            9    1    3    67   123  119
           10    2    1    64   122  114

These data are from a test of a weight-loss diet. For each of the ten persons in the test, the researcher has recorded the sex (1=male, 2=female), the race (1=Black, 2=White, 3=Oriental), the height in inches, and the weight before (WT1) and after (WT2) the diet. Using MINISTAT, we can answer a number of questions about these data.

STARTING THE PROGRAM

Start the program by entering LOAD "MINISTAT",8 and then entering RUN. At the title screen you will be given the opportunity to toggle the color between black on white and white on blue by pressing the space bar. Choose the color combination which is easiest to read, and then proceed to the main menu by pressing "C". You will then see the main menu, which looks like this:

    SELECT

    A) SAVE     1) DESC
    B) INFO     2) FREQ
    C) OLD      3) REGR
    D) DIR      4) CHI2
    E) NEW      5) T:UR
    F) KILL     6) T:RS
    G) COMP     7) ALPHA
    H) HELP

At this point, there is no data file loaded into the program.
The only options on the menu which will work are "C", which will allow you to retrieve a previously saved file, "D", which will display the catalog of previously saved MINISTAT files, "E", which will allow you to input a new file, "F", which will erase a previously saved file, and "H", which will allow you to view the help files.

SETTING UP THE FILE (Option "E")

To set up a new file, hit "E" at the main menu. You will be asked for a file name. This can be anything you wish that you can easily associate with your study. We will name this file "DIET."

Next you are asked "N VARS?" This means "How many variables are in the file?" In our example there are five variables, so we enter the number 5. We are then asked for N, and we enter 10, meaning there are ten subjects in the study. N must be from 2 to 100, and the number of variables must be from one to 30.

Next, MINISTAT asks, NAMES (y or n)?. This means, "Would you like to name the variables?" If we press N, indicating No, MINISTAT will assign the variables the names V1, V2, etc., and go straight into the data entry section. If we press Y, we will be given an opportunity to assign our own names. For our example, we will press Y. MINISTAT then asks NAME1? and we enter SEX, the name of our first variable. After NAME2 we input RACE, after NAME3 HT., and so on.

Once the file characteristics have been input, we are ready to input the actual data. MINISTAT will ask for the first case of the first variable, continuing down through every case of the first variable, and then go on to subsequent variables. For example, MINISTAT will initially print SEX - CASE 1?, and we will enter a 1, indicating that the sex of the first subject is male. If we make a mistake, we can back up by simply pressing ENTER.
The data entry might look like this:

    SEX -- CASE 1: 1
    SEX -- CASE 2: 2
    SEX -- CASE 3: 3      (this value is a mistake)
    SEX -- CASE 4:        (enter, we back up)
    SEX -- CASE 3: 1
    SEX -- CASE 4: 1
          *
          *
          *
    WT2 -- CASE 9: 119
    WT2 -- CASE 10:114

SELECTING A PROCEDURE

After the data have been entered, the other options in the menu become available. Letters (A through H) select utility procedures; numbers (1 through 7) select statistical procedures. A procedure is selected by simply pressing the corresponding number or letter -- you do not need to press enter.

Procedures that require that a variable be selected will produce a prompt mark: >?. This mark indicates that a variable name should be entered. Some procedures require that more than one variable be entered and will produce this mark again until all variables have been entered. If you input an unrecognized name, two question marks will be printed.

After a procedure has been executed, you will be asked, if appropriate, AGAIN (y or n)?. If you would like to perform the same procedure with different variables or parameters, type Y. If you want to return to the main menu, type N.

Detailed descriptions of each procedure are listed below.

1) DESC

This procedure generates descriptive statistics for any of the variables. If we enter variable WT1, the description looks like this:

    MEAN: 135.5         VAR:  393.25
    S.D.: 19.830532     S.E.: 6.27096
    SUM:  1355          N:    10
    MAX:  168           MIN:  105

Here is what each of these statistics means:

N: The total number of subjects in the sample.

SUM: The sum of all the scores or measurements.

MEAN: This is the average value, the sum divided by N.

MAX, MIN: The maximum and minimum. The heaviest person in this sample weighed 168 lbs., the lightest 105 lbs.

VAR: This is the variance of the sample -- to what degree the scores are spread out or clustered together.

S.D.: The standard deviation, which is the square root of the variance.
In large samples that are roughly normally distributed, about 68% of the scores will fall within one standard deviation of the mean, 95% within two standard deviations.

S.E.: This is the standard error of the mean, which is the standard deviation divided by the square root of N. It estimates the standard deviation of the means of all possible samples of size N.

2) FREQ

This procedure generates a histogram or bargraph. It shows how many subjects fall within each of a number of consecutive values or value ranges of a variable. The program first asks for a variable name, and then for an interval size. Choose an interval size which is a fraction of the total range, but at least equal to the unit of measurement. Using an interval size of 2, height is distributed like this:

    60 ******* (1)
    62 ******* (1)
    64 ************** (2)
    66 ************** (2)
    68 ********************* (3)
    70 ******* (1)

The top bar shows that there is one subject who is at least 60 inches but is shorter than 62 inches. We can see that the modal interval, the one with the most subjects, is the one with subjects who are at least 68 inches tall, but shorter than 70 inches.

3) REGR

This procedure generates a scattergram, a regression equation, a correlation coefficient, and a t-value with associated degrees of freedom. All of these statistics allow us to examine the relationship between two variables.

The scattergram is a plot of the values of one variable against the values of another. A strong positive relationship, as one might expect to find between variables such as height and weight or job prestige and income, will show all of the points tightly clustered in a straight line going from the lower left to the upper right. A weak relationship, such as that between nose length and IQ, would show points scattered about in a more or less random fashion. A strongly negative relationship, as one might find between blood alcohol levels and performance on a driving test, would show points clustered tightly from the upper left down to the lower right.
The first variable entered is the X variable, shown along the bottom of the graph. The second variable entered is the criterion or Y variable, and is shown along the side.

The regression equation is shown below the scattergram. This is the formula which does the best job of predicting the Y variable from the X variable. The correlation coefficient (R) quantifies the degree of relationship between the two variables. The value of R can range from -1, a perfect negative relationship, through zero, no relationship, to +1, a perfect positive relationship.

The t-value along with the degrees of freedom allows one to test if the relationship is strong enough to be generalized beyond the sample to the population in general. The P value shows the level of significance for the t-value, that is, how likely it is that a relationship at least this strong would appear in a sample by chance alone if there were no real relationship in the population. A P of less than .05 is generally thought of as significant.

If we enter height (HT.) as our first variable, and weight before the diet (WT1) as our second variable, we get these results:

    WT1 = (6.094*HT.) + -267.906
    R=0.92  T=6.631  DF=8  P<.001

The regression formula provides a way to predict weight, given a person's height. If someone is five feet, or 60 inches tall, we could predict that they would weigh (6.094*60)-267.906, or 97.7 pounds. The R of .92 is relatively high; it shows us that the relationship is strongly positive, and that we can predict one variable from the other with relatively little error. The t, df, and p values can tell us whether the R is high enough to be generalized to the population from which we drew our sample, or whether it might be a fluke found in this particular sample. P<.001 means that, if there were no correlation between height and weight in the population, a sample correlation this strong would turn up less than one time in 1000.

4) CHI2

This procedure gives a chi-square value, the associated degrees of freedom, and a contingency table.
A chi-square is a measure of association between two variables with nominal level data. Data is called nominal level when it is used only to designate groups, not as scores or rankings. Zip codes are an example of nominal level data. In our example, SEX and RACE are nominal level variables. By looking at the association between SEX and RACE, we can determine whether the ratio of males to females differs significantly according to race. The run looks like this:

    >?SEX
    >?RACE

    SEX  RACE  FREQ  FR EXP
     1    1     1     1.5
     1    2     2     2
     1    3     2     1.5
     2    1     2     1.5
     2    2     2     2
     2    3     1     1.5

    CHI2=0.667  DF=2  N.S.

The chi-square value which was derived is below the level needed for significance at the .05 level. This is indicated by the notation "N.S.," which means "not significant." The above results indicate that the proportion of males to females does not differ significantly among the three different races in our sample. If there were a significant relationship, instead of "N.S.," we would see "P<.05."

5) T:US

This procedure performs a t-test for unrelated samples. It reports the mean, standard deviation, and N of each group, as well as the pooled standard error, the t-value, degrees of freedom, and p or probability value.

In this procedure, a criterion variable is split into two groups, and the means of the two groups are compared. Any of the other variables can be used as grouping variables. The grouping variable is entered first, then the criterion variable. Finally, two value ranges are specified for the grouping variable. Subjects who fall into the first range are designated as "LEVEL 1", those falling within the second range are "LEVEL 2."

Suppose we would like to see if males are, on the average, different in height from females. The run would look like this:

    >?SEX
    >?HT.
    L1,H1: 1,1
    L2,H2: 2,2

            LEVEL 1   LEVEL 2
    MEAN    68.6      63.8
    SDEV    1.02      2.315
    N       5         5
    --------------------------------
    S.E.=1.265  T=3.795  DF=8  P=5E-03

SEX is the variable which defines the two groups, or levels, and so it was entered first. HT.
is the criterion variable, and was entered second. In response to the "L1, H1:," we entered 1,1. Entering these ones indicated that the first group of subjects in which we are interested ranges from one to one, inclusive, on the variable "SEX." This is all the males. The second group ranges from two to two, and includes all of the females.

Looking at our results, we see that males are, on the average, 68.6 inches tall, or just under 5'9". The females are just under 5'4". Our p-value is 5E-03, which is 5 times 10 to the -3 power, or .005. Since this is less than the conventional .05 level of significance, we can say that males are taller than females in the population from which our sample was drawn.

6) T:RS

This procedure performs a t-test on related samples. This test is also called a matched-pairs or repeated-measures t-test. The procedure provides a t-value, associated degrees of freedom, and p or probability level. The t-value is positive if the first variable entered has the larger mean; it is negative if the second variable has the larger mean.

Related-samples means that the scores are expected to be correlated, and can reasonably be analyzed in pairs. Such is the case when the same subjects are exposed to two different experimental conditions, or when some measure is taken before and after a certain treatment. Because it analyzes the differences between pairs of scores instead of groups of scores, the test is more sensitive, and significant results are more easily obtained.

In these data, perhaps the most interesting question is whether the subjects weighed significantly less after the diet than they did before it. The run would look like this:

    >? WT1
    >? WT2

    MEAN    135.5     131.9
    SDEV    19.831    17.768

    T=2.785  DF=9  P=.021

Since our p-value of .021 suggests that such results would be rare (obtained only 21 out of 1000 times) with an ineffective diet, we can conclude that the diet would be effective if used by others in the population from which our sample was drawn.
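
The arithmetic behind the two t-test runs above (T:US and T:RS) can be sketched in a few lines of Python. This is only an illustration: it assumes the standard textbook formulas (a pooled-variance t for unrelated samples, and a t computed on the case-by-case differences for related samples). MINISTAT's own code is not shown in this manual, and the function names below are made up for the example. The p-values are left out because they need a t-distribution table or routine.

```python
from math import sqrt
from statistics import mean

# Sample data from the DIET example in this manual.
SEX = [1, 2, 1, 1, 2, 2, 1, 2, 1, 2]
HT = [68, 60, 69, 70, 63, 65, 69, 67, 67, 64]
WT1 = [143, 105, 162, 168, 115, 123, 149, 145, 123, 122]
WT2 = [140, 103, 153, 160, 118, 125, 147, 140, 119, 114]


def sum_sq_dev(xs):
    """Sum of squared deviations from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def t_unrelated(group, criterion, range1, range2):
    """Pooled-variance t-test (hypothetical helper, not MINISTAT code).

    Each range is an inclusive (low, high) pair on the grouping variable.
    Returns (t, degrees of freedom, pooled standard error).
    """
    a = [y for g, y in zip(group, criterion) if range1[0] <= g <= range1[1]]
    b = [y for g, y in zip(group, criterion) if range2[0] <= g <= range2[1]]
    df = len(a) + len(b) - 2
    pooled_var = (sum_sq_dev(a) + sum_sq_dev(b)) / df
    se = sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se, df, se


def t_related(first, second):
    """Paired t-test on the differences, first minus second (hypothetical helper)."""
    d = [x - y for x, y in zip(first, second)]
    df = len(d) - 1
    se = sqrt(sum_sq_dev(d) / df / len(d))
    return mean(d) / se, df


t_us, df_us, se_us = t_unrelated(SEX, HT, (1, 1), (2, 2))
t_rs, df_rs = t_related(WT1, WT2)
print(f"T:US  S.E.={se_us:.3f} T={t_us:.3f} DF={df_us}")
print(f"T:RS  T={t_rs:.3f} DF={df_rs}")
```

Run against the sample DIET data, this prints S.E.=1.265, T=3.795, DF=8 for the unrelated-samples test and T=2.785, DF=9 for the related-samples test, in agreement with the runs shown above.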

7) ALPHA

Note: This function is not available in the public domain version of MINISTAT.

This procedure calculates coefficient alpha, a measure of the internal consistency and reliability of a test. The procedure asks first for number of items. Enter the number of test items for which you will be assessing reliability. It then prompts you for variables, which are the names of each test item. The output shows coefficient alpha, and the correlation of each item with the sum of all other test items. This allows you to judge which items are inconsistent with the rest of the test, and which you should consider discarding to increase the test's reliability.

*****

UTILITIES

A) SAVE

This function allows you to save your current file to the disk, so that you can reanalyze the data at a later time. It asks if you want to change the file name, so if the file has been modified, the new file can be saved without erasing the original file.

B) INFO

This command allows you to view information about the current file. It will show the file name, number of observations (N), and the variable names.

C) OLD

This command retrieves a file that had been previously created with the E) NEW command. When using this command, if there is already a data file in memory it will be erased. To guard against accidental loss of data, the program asks if you are sure that you want to load a new file.

D) DIR

This command lists the data files on the current disk. It will list only data files created by this program.

E) NEW

This command allows you to create a new data file. See the section under "SETTING UP THE FILE" in this document for more details.

F) KILL

This command can be used to delete any files that have been created by MINISTAT. Enter the name of the file at the prompt, and you will either be told that the file has been killed, or that the file cannot be found.

G) COMP

This option allows you to transform a variable to create a new variable.
For instance, we might want to convert the WT1 variable in our DIET file from pounds to kilograms. We can do this by multiplying by .4536, since one pound is about .4536 kilograms.

When we enter the COMP procedure, we are asked "CONSTANT OR VARIABLE?" Type C for CONSTANT if you are going to transform your variable with a constant, type V for VARIABLE if you are going to use another variable to transform it. In our example, we will type C, since we are using a constant to transform pounds to kilograms. Next, input the variable or constant which will be used -- in our case, .4536. The next step is to select the operation, i.e., addition, multiplication, etc. We are going to multiply, so we press "*". Then we enter the variable to be transformed, WT1. The program shows us the transformation equation, .4536*WT1.

We are then asked for the name of the new variable, the one we have created by the transformation. We can either write over an old variable, or we can create a new one. For this example, we will pick the name WT3. The screen will look like this after the transformation:

    VARIABLE OR CONSTANT? CONSTANT
    ? .4536
    SELECT OPERATION: *
    >? WT1
    .4536*WT1
    NEW VARIABLE>? WT3
    ??
    COMPUTATION COMPLETED

You can use the INFO procedure to reassure yourself that the new variable is there.

H) HELP

Note: This procedure is not available in the public domain version.

HELP will bring you to a help menu very similar to the main menu. Requesting any procedure while in the HELP area will give a single screen describing how to use the procedure. The <at> key will return you to the main menu.

*****

PRINTING

To print out a screen, first turn on your printer, and then hit the <F1> key.
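
As a rough illustration of what the COMP utility described above does, the transformation can be sketched in Python. The dictionary layout, the function name comp, and the operand-first ordering (taken from the way the transformation equation is displayed) are assumptions made for this sketch, as is support for "-" and "/"; the manual itself names only addition and multiplication. The constant used is about 0.4536 kilograms per pound.

```python
import operator

# Hypothetical in-memory layout: one list of values per variable name.
data = {
    "WT1": [143, 105, 162, 168, 115, 123, 149, 145, 123, 122],
    "WT2": [140, 103, 153, 160, 118, 125, 147, 140, 119, 114],
}

# "+" and "*" are named in the manual; "-" and "/" are assumed here.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def comp(data, operand, op, target, new_name):
    """Store `operand op target` for every case under new_name.

    operand is a number (CONSTANT mode) or a variable name (VARIABLE mode).
    Using the name of an existing variable as new_name overwrites it.
    """
    fn = OPS[op]
    values = data[target]
    if isinstance(operand, str):
        left = data[operand]                # VARIABLE mode
    else:
        left = [operand] * len(values)      # CONSTANT mode
    data[new_name] = [fn(a, b) for a, b in zip(left, values)]


LB_TO_KG = 0.4536  # one pound is about 0.4536 kilograms
comp(data, LB_TO_KG, "*", "WT1", "WT3")     # WT3 = 0.4536*WT1
comp(data, "WT1", "-", "WT2", "LOSS")       # LOSS = WT1-WT2 (VARIABLE mode)
```

The second call shows the VARIABLE mode with a made-up variable name, LOSS, holding each subject's weight change in pounds.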

******************************************************
*                                                    *
*                 SHAREWARE MESSAGE                  *
*                                                    *
******************************************************

MINISTAT is a shareware program -- that is, try it out first, and if you find it useful & expect to continue using it, you pay a shareware fee -- a fee which is generally much lower than what you would pay for equivalent commercially available software. There are two options for meeting your shareware obligation with MINISTAT:

1) Send $10.00 and you will receive a printed copy of this manual, along with a disk containing an enhanced version of the program. This version includes a full HELP menu and an additional statistical procedure (see ALPHA above). As a registered user, you will be notified of program updates, which you will be able to receive if you send me a blank disk and a self-addressed stamped envelope.

2) Actually, the $10.00 above barely covers expenses, and will certainly not make me rich. As for most shareware authors (I think), the real satisfaction comes from knowing that people are using and enjoying the software. So here is option #2: your shareware obligation can be met completely just by COMMENTING on the software. Send me a note by mail or CIS e-mail to let me know how you are using the program, what you like, and what you found confusing. You can also send any recommendations for improvement. If you choose to make a contribution of less than $10, you are welcome to do that also, of course.

I suspect we all suffer from at least some "shareware guilt," from all those programs we have downloaded and not yet paid for -- so with the two options above you can easily rid yourself of some of this destructive emotion at low cost or at no cost.

Thanks for your interest -- have fun with MINISTAT!

    --Jon Rich, Ph.D.
      23212-6 Orange Ave.
      El Toro, CA 92630-6918
      CIS 73367,1326